1. Goal

This tutorial explains how to do some very basic spatial analysis with polygon and point data. The exercise focuses on counting the occurences of points within polygons and plotting the results as a heatmap using the leaflet package. To that end we’ll follow these steps:

For a more in-depth tutorial on GIS and Spatial Analysis in R please see https://mgimond.github.io/Spatial/index.html.

2. Data sources

Both shapefiles were downloaded from the NYC Open data:

3. Shapefile data

Before we being set your working directory:

setwd("/Users/ola/Desktop/EDAV_Robbins/community/")

Shapefile format is a popular geospatial vector data format for GIS (geographic information system) software (from https://en.wikipedia.org/wiki/Shapefile) The information is stored in several files with different extensions, some of which are mandatory:

Others, like .prj (a plain text file with the coordinate system and projection information), are optional.

rgdal is R’s interface to the popular C/C++ spatial data processing library gdal and is useful for handling shapefiles (https://cran.r-project.org/web/packages/rgdal/rgdal.pdf).

Using the function rgdal::ogrInfo we can get some info regarding the shapefiles, such as:

the 1st argument of the rgdal::ogrInfo is the name of the directory in the working directory the shapefile is in and the 2nd argument is the name of the file without the extension.

library(rgdal)
rgdal::ogrInfo("ZIP_CODE_040114", "ZIP_CODE_040114")
## Source: "ZIP_CODE_040114", layer: "ZIP_CODE_040114"
## Driver: ESRI Shapefile; number of rows: 263 
## Feature type: wkbPolygon with 2 dimensions
## Extent: (913129 120020.9) - (1067494 272710.9)
## CRS: +proj=lcc +lat_1=40.66666666666666 +lat_2=41.03333333333333 +lat_0=40.16666666666666 +lon_0=-74 +x_0=300000 +y_0=0 +datum=NAD83 +units=us-ft +no_defs  
## LDID: 87 
## Number of fields: 12 
##          name type length typeName
## 1     ZIPCODE    4      5   String
## 2     BLDGZIP    4      1   String
## 3     PO_NAME    4     28   String
## 4  POPULATION    2     19     Real
## 5        AREA    2     19     Real
## 6       STATE    4      2   String
## 7      COUNTY    4     20   String
## 8     ST_FIPS    4      2   String
## 9    CTY_FIPS    4      3   String
## 10        URL    4    200   String
## 11 SHAPE_AREA    2     19     Real
## 12  SHAPE_LEN    2     19     Real
rgdal::ogrInfo("Subway_Entrances", "geo_export_f1302dae-758c-45a4-9f89-a6d045604655")
## Source: "Subway_Entrances", layer: "geo_export_f1302dae-758c-45a4-9f89-a6d045604655"
## Driver: ESRI Shapefile; number of rows: 1928 
## Feature type: wkbPoint with 2 dimensions
## Extent: (-74.25283 40.51211) - (-73.75418 40.9036)
## CRS: +proj=longlat +ellps=WGS84 +no_defs  
## LDID: 0 
## Number of fields: 4 
##       name type length typeName
## 1 objectid    2     33     Real
## 2      url    4    254   String
## 3     name    4    254   String
## 4     line    4    254   String

Read shapefiles into spatial dataframe objects using rgdal::readOGR (arguments analogical to rgdal::ogrInfo).

zip_polygons <- rgdal::readOGR("ZIP_CODE_040114", "ZIP_CODE_040114")
## OGR data source with driver: ESRI Shapefile 
## Source: "ZIP_CODE_040114", layer: "ZIP_CODE_040114"
## with 263 features
## It has 12 fields
subway_entrances_points <- rgdal::readOGR("Subway_Entrances", "geo_export_f1302dae-758c-45a4-9f89-a6d045604655")
## OGR data source with driver: ESRI Shapefile 
## Source: "Subway_Entrances", layer: "geo_export_f1302dae-758c-45a4-9f89-a6d045604655"
## with 1928 features
## It has 4 fields

Note the type of the resulting objects (SpatialPolygonsDataFrame and SpatialPointsDataFrame):

class(zip_polygons)
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"
class(subway_entrances_points)
## [1] "SpatialPointsDataFrame"
## attr(,"package")
## [1] "sp"

We can access the data (all but the latitude and longitude info), which is stored in the data slot using @data:

head(zip_polygons@data)
##   ZIPCODE BLDGZIP  PO_NAME POPULATION     AREA STATE COUNTY ST_FIPS
## 0   11436       0  Jamaica      18681 22699295    NY Queens      36
## 1   11213       0 Brooklyn      62426 29631004    NY  Kings      36
## 2   11212       0 Brooklyn      83866 41972104    NY  Kings      36
## 3   11225       0 Brooklyn      56527 23698630    NY  Kings      36
## 4   11218       0 Brooklyn      72280 36868799    NY  Kings      36
## 5   11226       0 Brooklyn     106132 39408598    NY  Kings      36
##   CTY_FIPS                  URL SHAPE_AREA SHAPE_LEN
## 0      081 http://www.usps.com/          0         0
## 1      047 http://www.usps.com/          0         0
## 2      047 http://www.usps.com/          0         0
## 3      047 http://www.usps.com/          0         0
## 4      047 http://www.usps.com/          0         0
## 5      047 http://www.usps.com/          0         0
head(subway_entrances_points@data)
##   objectid                               url
## 1     1734 http://web.mta.info/nyct/service/
## 2     1735 http://web.mta.info/nyct/service/
## 3     1736 http://web.mta.info/nyct/service/
## 4     1737 http://web.mta.info/nyct/service/
## 5     1738 http://web.mta.info/nyct/service/
## 6     1739 http://web.mta.info/nyct/service/
##                                      name line
## 1 Birchall Ave & Sagamore St at NW corner  2-5
## 2 Birchall Ave & Sagamore St at NE corner  2-5
## 3 Morris Park Ave & 180th St at NW corner  2-5
## 4 Morris Park Ave & 180th St at NW corner  2-5
## 5       Boston Rd & 178th St at SW corner  2-5
## 6  Boston Rd & E Tremont Ave at NW corner  2-5

Latitude and longitude data for each polygon is stored in polygons slot, so, for istance, we can access the latitude and longitude data of the 1st polygon using:

head(zip_polygons@polygons[[1]]@Polygons[[1]]@coords)
##         [,1]     [,2]
## [1,] 1038098 188138.4
## [2,] 1038142 188154.8
## [3,] 1038171 188165.8
## [4,] 1038280 188206.7
## [5,] 1038521 188296.8
## [6,] 1038764 188388.4

4. Projection

Note the extent of the latitude and longitued data for polygons as outputted by rgdal::ogrInfo: ## Extent: (913129 120020.9) - (1067494 272710.9). This doesn’t look like the latitude and longitude of NYC we are used to (latitude ~40N or +40, longitude ~74W or -74). In contrast, the extent for the points file is more what we expect: ## Extent: (-74.25283 40.51211) - (-73.75418 40.9036).

This is because the projection of the polygon shapefile is different - it is not the the usual WGS (World Geodetic System) projection, as it is in the case of the points data. To be able to join the two files spatially and also to plot them using the leaflet package we will tarnsform the projection of the polygon data using sp::spTransform.

sp is R’s Classes and Methods for Spatial Data (https://cran.r-project.org/web/packages/sp/sp.pdf). sp::spTransform takes the spatial dataframe object to be transformed as the 1st argument and the target coordinate reference system object as the 2nd argument (using the class CRS). We project the polygon file to the target CRS equal to that of the points file, which we can obtain using rgdal::ogrInfo: ## CRS: +proj=longlat +ellps=WGS84 +no_defs.

library(sp)
zip_polygons_projected <- sp::spTransform(zip_polygons, CRS("+proj=longlat +ellps=WGS84 +no_defs"))
head(zip_polygons_projected@polygons[[1]]@Polygons[[1]]@coords)
##           [,1]     [,2]
## [1,] -73.80585 40.68291
## [2,] -73.80569 40.68295
## [3,] -73.80558 40.68298
## [4,] -73.80519 40.68310
## [5,] -73.80432 40.68334
## [6,] -73.80345 40.68359

Note that projection and the extent of the projected file is the same as the points file. We can access this info directly using functions from package raster, which is another R’s geographic data analysis and modeling package (https://cran.r-project.org/web/packages/raster/raster.pdf).

library(raster)
raster::projection(zip_polygons_projected)
## [1] "+proj=longlat +ellps=WGS84 +no_defs"
raster::projection(subway_entrances_points)
## [1] "+proj=longlat +ellps=WGS84 +no_defs"
raster::extent(zip_polygons_projected)
## class       : Extent 
## xmin        : -74.25576 
## xmax        : -73.6996 
## ymin        : 40.49584 
## ymax        : 40.91517
raster::extent(subway_entrances_points)
## class       : Extent 
## xmin        : -74.25283 
## xmax        : -73.75418 
## ymin        : 40.51211 
## ymax        : 40.9036

5. Visualize shapes

Let’s quickly build a default leaflet map with just two layers (addPolygons for polygons and addCircles for points) to visualize the shapes and make sure thet plot in the same geographical area:

library(leaflet)

initial_map <- leaflet() %>%
  
  # polygons
  leaflet::addPolygons(data = zip_polygons_projected) %>%

  # points
  leaflet::addCircles(data = subway_entrances_points)

initial_map

In the map above we can see that the polygons form the familiar shape of the five boroughs and that the points lie within the polygons. Good, this is what we expected.

Leaflet allows for some interaction with the produced map. In the map above it is possible to zoom in and out to see more detail.

5. Spatial join

Now we use sp::over function to perform a spatial join of the points dataframe with the polygons dataframe. sp::over takes the point dataframe as the 1st argument and the polygons dataframe as the 2nd. It creates a list of all points with the polygons they fall inside. Note the number of rows of this dataframe is exactly the same as of the subway_entrances_points dataframe (1928) and the columns are exacly like in the zip_polygons_projected dataframe. This is because sp::over creates one entry for each point in the new dataframe with all features equal to the features of the polygon that points is inside.

point_in_polygon <- sp::over(subway_entrances_points, zip_polygons_projected)
class(point_in_polygon)
## [1] "data.frame"
nrow(point_in_polygon)
## [1] 1928
head(point_in_polygon)
##   ZIPCODE BLDGZIP PO_NAME POPULATION     AREA STATE COUNTY ST_FIPS
## 1   10462       0   Bronx      75674 53022514    NY  Bronx      36
## 2   10462       0   Bronx      75674 53022514    NY  Bronx      36
## 3   10460       0   Bronx      56670 35155667    NY  Bronx      36
## 4   10460       0   Bronx      56670 35155667    NY  Bronx      36
## 5   10460       0   Bronx      56670 35155667    NY  Bronx      36
## 6   10460       0   Bronx      56670 35155667    NY  Bronx      36
##   CTY_FIPS                  URL SHAPE_AREA SHAPE_LEN
## 1      005 http://www.usps.com/          0         0
## 2      005 http://www.usps.com/          0         0
## 3      005 http://www.usps.com/          0         0
## 4      005 http://www.usps.com/          0         0
## 5      005 http://www.usps.com/          0         0
## 6      005 http://www.usps.com/          0         0

We turn that dataframe into a dataframe with columnns: ZIPCODE and Freq using the table function on the ZIPCODE column of point_in_polygon.

zip_point_count <- as.data.frame(table(point_in_polygon$ZIPCODE))
head(zip_point_count)
##    Var1 Freq
## 1 00083    5
## 2 10001   50
## 3 10002   17
## 4 10003   28
## 5 10004   20
## 6 10005   14

Since table automatically assigns names to columns, we change the name of the column with zipcodes (Var1) so that it is the same as the name of the column in the zip_polygons_projected dataframe. This is necessary so we can merge these dataframes.

names(zip_point_count)[names(zip_point_count)=="Var1"] <- "ZIPCODE"
head(zip_point_count)
##   ZIPCODE Freq
## 1   00083    5
## 2   10001   50
## 3   10002   17
## 4   10003   28
## 5   10004   20
## 6   10005   14

Perform a left-merge (all.x = TRUE) on these dataframes, effectively adding a column with the count of subway entrances to the zip_polygons_projected dataframe. Do the join on the ZIPCODE column (by = "ZIPCODE") - that’s why we had to rename it in the previous step.

zip_polygons_final <- merge(x = zip_polygons_projected, y = zip_point_count, by = "ZIPCODE", all.x = TRUE)

head(zip_polygons_final)
##     ZIPCODE BLDGZIP  PO_NAME POPULATION     AREA STATE COUNTY ST_FIPS
## 254   11436       0  Jamaica      18681 22699295    NY Queens      36
## 176   11213       0 Brooklyn      62426 29631004    NY  Kings      36
## 175   11212       0 Brooklyn      83866 41972104    NY  Kings      36
## 188   11225       0 Brooklyn      56527 23698630    NY  Kings      36
## 181   11218       0 Brooklyn      72280 36868799    NY  Kings      36
## 189   11226       0 Brooklyn     106132 39408598    NY  Kings      36
##     CTY_FIPS                  URL SHAPE_AREA SHAPE_LEN Freq
## 254      081 http://www.usps.com/          0         0    0
## 176      047 http://www.usps.com/          0         0   12
## 175      047 http://www.usps.com/          0         0   10
## 188      047 http://www.usps.com/          0         0   14
## 181      047 http://www.usps.com/          0         0   14
## 189      047 http://www.usps.com/          0         0   19

The spatial join is now complete and we can start plotting the data.

It should be noted that there are other ways to perform a spatial join in R that e.g. retain the data for both joined datasets (http://www.nickeubank.com/wp-content/uploads/2015/10/RGIS2_MergingSpatialData_part1_Joins.html). However, since in this exerciswe we are interested only in the count of points within each polygon, sp::over function is sufficient.

6. Make a map

Now onto plotting our dataframes using leaflet. leaflet allows to create interactive web maps with JavaScript (https://cran.r-project.org/web/packages/leaflet/leaflet.pdf). There is plenty of tuorials already existing for leaflet, such as https://rstudio.github.io/leaflet/. Here I’m only going to get into very basic functionality that is needed to complete the exercise.

6.1 Color palette

Set palette of colors for polygons - they are to be colored by the count of subway entrances (Freq column of zip_polygons_final) from yellow (low count) to red (high count). We use colorBin function of the leaflet package because the data is numeric and discrete.

library(leaflet)
palette <- leaflet::colorBin("YlOrRd", domain = zip_polygons_final$Freq)

6.2 Labels for points and polygons

Set labels of the polygons using html markup (htmltools package). Label string will have 3 lines derived from the zip_polygons_final data frame:

library(htmltools)
poly_labels <- sprintf("<strong>ZIP: %s</strong><br/>%s<br/>Num. entrances: %s",
                      zip_polygons_final$ZIPCODE, zip_polygons_final$PO_NAME, zip_polygons_final$Freq) %>%
  lapply(htmltools::HTML)

Set labels of the points. Label string will have 2 lines derived from the subway_entrances_points data frame:

point_labels <- sprintf("Line: %s<br/>Name: %s",
                       subway_entrances_points$line, subway_entrances_points$name) %>%
  lapply(htmltools::HTML)

6.3 Map opbject

Create a map object using leaflet package with the following layers:

myNYCmap <- leaflet() %>%
  # base map
  leaflet::addProviderTiles(providers$CartoDB.Positron) %>%
  
  # polygons
  leaflet::addPolygons(data = zip_polygons_final,
              color = "#444444",
              weight = 1,
              smoothFactor = 0.5,
              opacity = 1.0,
              fillOpacity = 0.5,
              fillColor = ~palette(Freq),
              # change display properties while mouse cursor hovers on the polygon
              highlightOptions = highlightOptions(color = "white", 
                                                  weight = 2, 
                                                  bringToFront = TRUE),
              label = poly_labels,
              # label display options
              labelOptions = labelOptions(style = list("font-weight" = "normal", padding = "3px 8px"),
                                          textsize = "15px",
                                          direction = "auto")) %>%
  # point markers
  leaflet::addMarkers(data = subway_entrances_points,
                   label = point_labels) %>%
  # legend
  leaflet::addLegend(pal = palette,
            values = zip_polygons_final$Freq,
            opacity = 0.7,
            title = "Num. entrances",
            position = "topright")

6.5 Display the map

myNYCmap

The above map features the following functionalities:

You can also save your map using htmlwidgets package.

library(htmlwidgets)
htmlwidgets::saveWidget(myNYCmap, 'myNYCmap.html', selfcontained = TRUE)

In case you are annoyed with the size and the density of the markers in the previous map, you can also plot the points using leaflet::addCircles. This option, however, doesn’t allow for labels of the points appearing on mouse-over.

myNYCmap_circles <- leaflet() %>%
  # base map
  leaflet::addProviderTiles(providers$CartoDB.Positron) %>%
  
  # polygons
  leaflet::addPolygons(data = zip_polygons_final,
              color = "#444444",
              weight = 1,
              smoothFactor = 0.5,
              opacity = 1.0,
              fillOpacity = 0.5,
              fillColor = ~palette(Freq),
              # change display properties while mouse cursor hovers on the polygon
              highlightOptions = highlightOptions(color = "white", 
                                                  weight = 2, 
                                                  bringToFront = TRUE),
              label = poly_labels,
              # label display options
              labelOptions = labelOptions(style = list("font-weight" = "normal", padding = "3px 8px"),
                                          textsize = "15px",
                                          direction = "auto")) %>%
  # point markers
  leaflet::addCircles(data = subway_entrances_points) %>%
  
  # legend
  leaflet::addLegend(pal = palette,
            values = zip_polygons_final$Freq,
            opacity = 0.7,
            title = "Num. entrances",
            position = "topright")

htmlwidgets::saveWidget(myNYCmap_circles, 'myNYCmap_circles.html', selfcontained = TRUE)

myNYCmap_circles

References:

Data sources:

https://data.cityofnewyork.us/Business/Zip-Code-Boundaries/i8iw-xf4u/data

https://data.cityofnewyork.us/Transportation/Subway-Entrances/drex-xx56/data

Packages:

https://cran.r-project.org/web/packages/rgdal/rgdal.pdf

https://cran.r-project.org/web/packages/sp/sp.pdf

https://cran.r-project.org/web/packages/raster/raster.pdf

https://cran.r-project.org/web/packages/leaflet/leaflet.pdf

Tutorials:

http://www.nickeubank.com/wp-content/uploads/2015/10/RGIS2_MergingSpatialData_part1_Joins.html

https://rstudio.github.io/leaflet/

https://mgimond.github.io/Spatial/index.html

Other:

https://en.wikipedia.org/wiki/Shapefile